Abstract

The fundamental task of group testing is to recover a small distinguished subset of items from a 
large group while efficiently reducing the total number of tests (measurements). The key contribution 
of this paper is in adopting a new information-theoretic perspective on group testing problems.
Establishing its connection to Shannon-coding theory, we formulate the group testing problem as a channel
coding/decoding problem and derive a unifying result that reduces many of the interesting questions 
to computation of a mutual information expression. First, we derive an achievable bound on the total 
number of tests using random constructions and typical set decoding. This result is fairly general; it 
allows us to verify existing bounds for some of the known scenarios and extend the analysis to many 
new interesting setups including noisy versions of group testing. We obtain tradeoffs between the number 
of tests, sparsity and the group size for a wide range of models. Among the models we consider are the 
deterministic noise-free case, approximate reconstruction with bounded distortion, additive measurement 
noise, and dilution models. Then, we show that the achievable bound provides fairly tight asymptotic 
scaling by deriving an information theoretic lower-bound using Fano's inequality. 

I. Introduction 

Group testing has been effectively used in numerous applications. It was originally proposed during 
World War II to reduce the total number of blood tests required to detect soldiers with syphilis [2], [3].
Instead of conducting a separate blood test for each and every soldier, the idea was to pool blood samples 
from many soldiers and test them simultaneously. Group testing has been used in general in biology for 
screening libraries of DNA clones (strings of DNA sequence) of the human genome and for screening 
blood for diseases. Other applications include quality control for detecting defective parts in production 
lines, data forensics to test collections of documents by applying one-way hash functions, computer fault
diagnosis, and contention algorithms in multiple access communications. Recently, group testing methods 
have also been applied to spectrum enforcement in cognitive radios [4], [5]. 

In the basic group testing problem [2], we are given a population of N items. Among them, at most
K items, also called defectives, are of interest. The set of defectives is denoted by Q. Associated with
the group testing problem is a binary matrix M known as the measurement matrix. This matrix defines
the assignment of each of the items to different pools or collections. The (i, j) entry is 1 if the j-th item
is contained in the i-th pool and 0 otherwise. A test conducted on a pool is positive if at least
one item belonging to the pool is also an element of Q, and is negative otherwise. A measurement
matrix for a non-adaptive group testing algorithm is a T × N matrix whose T rows correspond to pools
of items and whose N columns correspond to items. Each item is thus represented by a column codeword of
length T. If K items are defective, then the outcomes of the T tests are the boolean sum of the corresponding
K columns of the measurement matrix. The goal is to construct a pooling design to recover the defective
set while reducing the required number of tests. Group testing is related in spirit to compressed sensing (CS).
In CS we are given an N-dimensional sparse signal with support size K. Random projections of the sparse
signal are obtained. The goal is to identify the support set while minimizing the number of projections.
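The pooling rule described above can be sketched in a few lines of Python. This is only an illustration: the 5 × 5 matrix `M` and the helper name `noiseless_outcomes` are made up for the example and do not come from the paper. Rows are tests, columns are items, and the outcome vector is the boolean sum (OR) of the defective columns.

```python
def noiseless_outcomes(M, defectives):
    """Return one 0/1 outcome per test (row of M).

    A test is positive iff at least one pooled item is defective,
    i.e. the outcome vector is the OR of the defective columns of M.
    """
    return [int(any(row[j] for j in defectives)) for row in M]


# Hypothetical T = 5 by N = 5 measurement matrix (rows = tests, columns = items).
M = [
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]

# Items 1 and 3 (0-based) are assumed defective for this example.
y = noiseless_outcomes(M, {1, 3})
print(y)
```

Only the first test comes out negative here, since it is the only pool that contains neither item 1 nor item 3.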

While the degradation of CS with noise has been characterized (see [6] and references therein), the 
noisy group testing problem, the main focus of this paper, has not been systematically studied. Our 
attention in this paper is on the so called non-adaptive testing problem [2], where the measurement 
matrix is formed prior to performing the tests. 

A significant part of the existing research on group testing is focused on combinatorial pool design 
(i.e. construction of measurement matrices) to guarantee the detection of the items of interest using a 
small number of tests. Two types of matrix constructions have been considered. Disjunct matrices [2]
satisfy the so-called covering property¹. In the context of group testing this property implies that a
test pattern obtained by taking any K columns of the measurement matrix does not cover any other
boolean sum of K or fewer columns. Matrices that satisfy this property are often referred
to as superimposed codes, and combinatorial constructions were extensively developed by [7], [8], [9].
Superimposed codes are desirable not only because they ensure identifiability but also because they lead
to efficient decoding. Separability [2] is a weaker notion that is also often employed. A separable matrix
ensures that the boolean sums of any K columns are all distinct, which ensures identifiability. Uniquely
decipherable codes [8], [7] are codes that guarantee that all boolean sums of K or fewer columns are
distinct. Recently, it has been shown [10] that all of these notions are equivalent up to a scaling factor on
the number of tests T.

¹ We say that a column x is covered by a column y iff x ∨ y = y.
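For very small matrices, the covering relation of the footnote and the separability property can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical, and the exhaustive enumeration over all K-subsets is feasible only for toy sizes.

```python
from itertools import combinations


def covers(y, x):
    """True if column x is covered by column y, i.e. (x OR y) == y entrywise."""
    return all((a | b) == b for a, b in zip(x, y))


def boolean_sum(M, cols):
    """OR of the selected columns of M, returned as a tuple of 0/1 entries."""
    return tuple(int(any(row[j] for j in cols)) for row in M)


def is_separable(M, K):
    """True if the boolean sums of all K-subsets of columns are distinct."""
    n_items = len(M[0])
    sums = [boolean_sum(M, c) for c in combinations(range(n_items), K)]
    return len(set(sums)) == len(sums)


# Hypothetical 5 x 5 design (rows = tests, columns = items).
M = [
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
print(is_separable(M, 2))
```

A design with two identical columns fails the check already for K = 1, because the two items cannot be told apart.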

A different approach to group testing based on the probabilistic method has also been advocated by several
researchers [11], [12], [13], [14], [15]. Dyachkov and Rykov [11] and Ruszinko [12] developed upper and
lower bounds on the number of rows T for a matrix to be K-disjunct (bounds on the length of superimposed
codes). Random designs were used to compute upper bounds on the lengths of superimposed codes by
investigating when randomly generated matrices have the desired covering/separability properties. They
showed that for $N \to \infty$ and $K \to \infty$, $T \geq O\left(\frac{K^2 \log N}{\log K}\right)$ for exact
reconstruction with worst-case input. Sebo [13] investigated average error probabilities and showed that for
an arbitrarily small error probability, a randomly generated matrix will be K-disjunct if $T = O(K \log N)$
as $N \to \infty$. Recently Berger et al. [14] proved upper and lower bounds on the number of tests for
two-stage disjunctive testing. Approximate reconstruction [15], whereby a fraction $\alpha$ of the defective
items are allowed to be in error, has also been described. Again the number of tests here has been shown to
scale as $T = O(K \log N)$ as $N \to \infty$.

While these approaches have generally characterized fundamental tradeoffs for noiseless group testing, 
the noisy counterpart of group testing has not been systematically addressed. In this paper we take a novel 
information theoretic approach to group testing problems. The common approach to previous related work 
was to prove bounds on the size of randomly generated matrices to exhibit the aforementioned separability 
and covering properties. In contrast, we formulate the problem as a detection problem and establish its 
connection to Shannon coding theory [16] which, to the best of our knowledge, is explored in this paper 
for the first time. While there exists a one-to-one mapping between both formulations, the new perspective 
allows us to easily obtain results for a wide range of models including noisy versions of group testing. 
Our approach, which is fairly general, is to map the group testing problem to a corresponding channel 
model which allows the computation of simple mutual information expressions to derive achievable 
bounds on the required number of tests. This result allows us to verify existing bounds for some of 
the known scenarios and extend the analysis to many new interesting setups including noisy versions of 
group testing. 

One major contribution of this paper is to develop tools to analyze long standing noisy versions of the 
group testing problem. In particular we consider two models: the dilution model and the additive model.
• Additive model: False alarms could arise from errors in some of the screening tests. This is manifested
when some tests are erroneously positive.
• Dilution model: Even though a positive item is contained in a given pool, the test's outcome could
be negative if the item's presence gets diluted for that specific test. For example, in blood testing
the positive sample might get diluted in one or more tests, leading to potential misses of infected
blood samples. To account for such a case we analyze the group testing problem where some of the
positive entries get flipped into zeros with a given probability u.

TABLE I
This table summarizes the scaling results for the various models considered in the paper. In particular,
it shows the required number of tests T as a function of the size of the defective set K, the total
number of items N, and the model parameters for the noiseless, additive noise, and dilution models for
both the average error and worst-case error criteria.

Table I summarizes the scaling results we obtained for the various models considered in the paper for
the average error and worst-case error criteria.

The rest of the paper is organized as follows. Section II describes the problem setup. The main
achievable result mapping the problem to a mutual information expression is provided in Section III.
Section IV considers the noise-free (deterministic) version of the problem with average and worst-case
errors. Approximate reconstruction with distortion is investigated in Section V. In Section VI we consider
different noisy models with additive and dilution effects. In Section VII we prove a converse bound using
Fano's inequality. Finally, we provide conclusions in Section VIII.

II. Problem setup 

Among a population of N items, K unknown items are of interest. The collection of these K items 
represents the defective set. The goal is to construct a pooling design, i.e. a collection of tests, to recover 
the defective set while reducing the number of required tests. The idea is illustrated in Fig. 1. In this
example, the defective set Q = {2}, i.e., K = 1, since only the second item is defective. The binary-valued
matrix shown represents the measurement matrix (transposed) defining the assignment of items to
tests. The entry is 1 if the item is a member of the designated test and 0 otherwise. At the bottom of the
figure we highlight the positive tests. The outcome of a test is positive if and only if the second item is 
a member of that test. Observing the output for a number of tests T, the goal is to recover the defective 
set Q. While in combinatorial group testing the goal is to find the defective set for the worst-case input, 
probabilistic group testing requires the average error to be small. Both formulations are considered in 
this paper in subsequent sections. We assume that the item-test assignment is generated randomly. Before 
we provide our main result we define the following notation: 

• N: Total number of items 

• K: Known number of defectives (or positive items) 

• p: Probability that an item is part of a given test 

• T: Total number of tests 

• $X_j^T$: for the j-th item, $X_j^T \in \{0, 1\}^T$ is a binary vector with t-th entry $X_j(t) = 1$ if the j-th
item is pooled in test t, and 0 otherwise. Following information theoretic convention, we call it the
j-th codeword. Upper-case letters are used to denote random variables and vectors, and lower-case
letters denote their realizations, e.g. $X_j$ is a random vector and $x_j$ a realization of this random vector.

• $Y^T$: the binary observation vector of length T, with entries equal to 1 for the tests with positive outcome.
Similarly, $y^T$ denotes a realization of $Y^T$.

• $S_\ell \subset \{1, 2, \dots, N\}$ denotes a subset of items of size $\ell$.

• $X^T_{S_\ell}$ denotes the collection of codewords of length T corresponding to the items in $S_\ell$. It is a matrix
of size $\ell \times T$. We drop the T when we refer to an instance, i.e., a vector of code symbols. To avoid
cumbersome notation, for the rest of the paper we use the shorthand notation $X_{(\ell)}$ instead of $X_{S_\ell}$.
It should cause no confusion that we are referring to the codewords of a specific subset of items.

• C: The measurement matrix, or the codebook, i.e., a collection of N codewords defining the pool
design, that is, the assignment of items to tests.

Noise-free case: 

For the noise-free case, the outcome of the tests Y is deterministic. It is the boolean sum of the codewords
corresponding to the defective set Q. In other words:

$$Y^T = \bigvee_{i \in Q} X_i^T. \tag{1}$$

Fig. 1. The shown binary structure defines the assignment of items to tests. Rows are codewords for corresponding items, and
columns correspond to pools of items (tests). The entry is 1 if the item is a member of the designated test and 0 otherwise. In
the shown example, the defective set Q = {2}, i.e. only the second item is defective. At the bottom of the figure we show the
positive tests. The outcome of a test is positive if and only if the second item is a member of that test. Observing the output
for a number of tests T, the hope is to recover the defective set Q.



Alternatively, if $R_i \in \{0, 1\}$ is an indicator function for the i-th item determining whether it belongs
to the defective set, i.e., $R_i = 1$ if $i \in Q$ and $R_i = 0$ otherwise, then the outcome $Y_t$ of the t-th test in
the noise-free case can be written as:

$$Y_t = \bigvee_{i=1}^{N} X_{ti} R_i \tag{2}$$

where $X_{ti}$ is the binary entry at cell (t, i) of the measurement matrix X.
Noisy cases: 

In this paper we also consider two noisy models, the additive model and the dilution model. 

• Additive Model: In this model we account for false alarms in the outcome of pooling tests. The
outcome of a test can still be 1 even if no positive items are pooled in that test. This effect is
captured by adding independent Bernoulli(q) random variables $W_t$ to the outcome of the t-th test of
the noise-free model in Eq. (2), i.e.:

$$Y_t = \left( \bigvee_{i=1}^{N} X_{ti} R_i \right) \vee W_t \tag{3}$$

where $W_t \sim \mathcal{B}(q)$, $t = 1, \dots, T$.



• Dilution Model: Positive items taking part in a given test might probabilistically behave as absent
(diluted). If all positive items in a given test appear as absent, that could potentially lead to
erroneously zero outcomes. This model is motivated by blood dilution due to pooling with other
negative tests or imperfectly diluted blood samples. This effect is captured by the Z-channel model
of Fig. 2. The outcome of the t-th test can be written as:

$$Y_t = \bigvee_{i=1}^{N} Z(X_{ti} R_i) \tag{4}$$

where Z represents the Z-channel model of Fig. 2.
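The two noisy observation models can be simulated with a short sketch. Combining both effects in one function is a convenience of this example rather than something the paper does (it treats the models separately), and the names `simulate_tests`, `q`, and `u` as keyword arguments are assumptions. Setting `q = 0` and `u = 0` recovers the noise-free model.

```python
import random


def simulate_tests(M, defectives, q=0.0, u=0.0, rng=None):
    """Simulate test outcomes under additive noise (q) and dilution (u).

    M          : list of rows (tests), each a list of 0/1 item memberships
    defectives : set of 0-based defective item indices
    q          : probability of a false alarm W_t = 1 (additive model)
    u          : probability that a pooled defective behaves as absent (dilution)
    """
    rng = rng or random.Random(0)  # fixed seed only to make the sketch reproducible
    outcomes = []
    for row in M:
        # Dilution: each pooled defective is seen with probability 1 - u (Z-channel).
        seen = any(row[j] and rng.random() >= u for j in defectives)
        # Additive noise: an independent Bernoulli(q) variable is OR-ed in.
        false_alarm = rng.random() < q
        outcomes.append(int(seen or false_alarm))
    return outcomes


# Hypothetical 5 x 5 design with items 1 and 3 defective.
M = [
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
print(simulate_tests(M, {1, 3}))          # noise-free outcomes
print(simulate_tests(M, {1, 3}, q=0.2))   # may contain false positives
print(simulate_tests(M, {1, 3}, u=0.2))   # may contain false negatives
```

With additive noise only, a negative outcome is never wrong; with dilution only, a positive outcome is never wrong.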




Fig. 2. Dilution channel (Z-channel): positive items taking part in a given test might probabilistically behave as absent (diluted). 
In other words, even though a positive item is contained in a given pool, the test's outcome could be negative if the item's 
presence gets diluted for that specific test. For example, in blood testing the positive sample might get diluted in one or more 
tests leading to potential misses of infected blood samples. 

Note that in the additive model, the outcome of testing a pool with no defective items might be erroneously 
positive, i.e., false positives would occur. On the other hand, the membership of a defective item in a 
given test might go unnoticed in the dilution model. If all defectives appearing in a given pool are diluted, 
a false negative occurs. The effect of dilution on the increase in number of tests is expected to be more 
severe than the effect of additive noise. This is explained by the fact that tests with negative outcomes 
are generally more informative; while a truly positive test merely indicates that at least one defective is 
present in the pool, a truly negative test exonerates all the members pooled in that test. With additive 
noise, a test with a negative outcome is never erroneous. In contrast, dilution diminishes our confidence 
in pools with negative outcomes, since a seemingly negative outcome would not necessarily mean all 
pooled members are perfect. Intuitively speaking, additive noise can be potentially mitigated by repetition 
of tests whereas the dilution effect is more intricate to resolve. This intuition is verified by the results 
we obtained through theoretical analysis as shown in the next sections. 



For notation, random vectors and their realizations are denoted by upper- and lower-case letters,
respectively. Index collections of codewords of size K by v, where $v \in \{1, 2, \dots, \binom{N}{K}\}$. Define a
decoding function $g(\cdot)$ that maps the outcome of the tests $Y^T$ of length T to an index v
corresponding to a specific set of defectives. Now define the error probability $\lambda_v$ as:

$$\lambda_v = \sum_{y^T} p(y^T \mid x_v)\, \mathbb{1}\big(g(y^T) \neq v \mid v\big) \tag{5}$$

where $\mathbb{1}$ is an indicator function. $\lambda_v$ is the probability of error conditioned on a given v, i.e. the probability
that the decoded set is different from the true defective set. Note that for the deterministic noise-free
case this simplifies to $\lambda_v = \mathbb{1}(g(y^T) \neq v \mid v)$. Averaging over all possible inputs v, we define the average
error $\lambda$ as:

$$\lambda = \frac{1}{\binom{N}{K}} \sum_{v} \lambda_v \tag{6}$$

For the aforementioned models (noiseless and noisy), we prove achievable and converse bounds on
the total number of tests T as $N, K \to \infty$. Namely, we consider the following criteria:

• Criterion 1: Average error probability:

$$\lambda_{avg} = \frac{1}{\binom{N}{K}} \sum_{v} \lambda_v$$

• Criterion 2: Worst-case error probability:

$$\lambda_{max} = \max_{v} \lambda_v$$

• Criterion 3: Reconstruction with distortion:

In this case, we are satisfied with approximate reconstruction of the defective set. Let d be a distance
function between the decoded set $g(y^T)$ and the true defective set v such that $d(g(y^T), v)$ is equal
to the number of misses. Given K declared candidates, an error occurs only if the number of missed
items is greater than $\alpha K$, i.e.,

$$\lambda_v = \mathbb{1}\big(d(g(y^T), v) > \alpha K \mid v\big)$$
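For a toy noise-free design, the three criteria can be evaluated exhaustively. The sketch below is illustrative: the decoder `first_consistent_set` is a stand-in chosen for the example (it is not the typical-set decoder analyzed in the paper), and all names are hypothetical. With `alpha = 0` the function reports the average and worst-case errors for exact recovery, and with `alpha > 0` it applies the distortion criterion.

```python
from itertools import combinations


def boolean_sum(M, subset):
    """Noise-free outcomes when the items in `subset` are defective."""
    return [int(any(row[j] for j in subset)) for row in M]


def first_consistent_set(M, y, K):
    """Toy decoder g: first K-subset (lexicographic) whose boolean sum equals y."""
    for cand in combinations(range(len(M[0])), K):
        if boolean_sum(M, cand) == y:
            return set(cand)
    return set()


def error_profile(M, K, alpha=0.0):
    """Return (lambda_avg, lambda_max) over all defective sets v of size K.

    An error for v is declared when the number of misses d(g(y), v)
    exceeds alpha * K; alpha = 0 corresponds to exact reconstruction.
    """
    errors = []
    for v in combinations(range(len(M[0])), K):
        decoded = first_consistent_set(M, boolean_sum(M, v), K)
        misses = len(set(v) - decoded)
        errors.append(int(misses > alpha * K))
    return sum(errors) / len(errors), max(errors)


# Hypothetical 5 x 5 design; M[:3] keeps only the first three tests.
M = [
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
print(error_profile(M, 2))                 # all five tests
print(error_profile(M[:3], 2))             # too few tests, exact recovery
print(error_profile(M[:3], 2, alpha=0.5))  # too few tests, one miss tolerated
```

Dropping tests makes several defective sets indistinguishable, which raises both error measures; tolerating a bounded number of misses can bring them back down.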

In the following section we derive our main result. We prove an achievable bound on the number of
tests T, where T(N, K) is said to be achievable if $\forall \delta > 0$ there exists a pool design of size T × N
such that $\lambda < \delta$, where $\lambda$ is defined according to the performance criteria defined above. The bound is
fairly general as it applies to the noise-free and the noisy versions of the problem, as we elaborate in the
following sections.



III. Main Result: Achievable bound 

To derive an achievable bound on the number of tests T, we show how the group testing problem can 
be mapped to an equivalent channel model. Then, using random coding and typical set decoding [16] 
we upper bound the error probability, i.e., the probability of misclassifying the defective set. 

Random matrix generation and the encoding process: 

The binary measurement matrix is generated i.i.d. according to a Bernoulli distribution. Associated with
each item is a codeword that represents its assignment to tests. The defective set corresponds to a collection
of K such codewords. Denote a collection of K codewords by $X^T_{(K)}$, which is T independent copies of
a collection of code symbols $X_{(K)}$. A collection of K codewords and the observed $y^T$ have a joint distribution

$$p(x^T_{(K)}, y^T) = \prod_{t=1}^{T} p\big(x_{(K)}(t)\big)\, p\big(y(t) \mid x_{(K)}(t)\big)$$

Before we describe our decoder, we define joint typicality in the context of this group testing problem.

Definition 3.1: A collection of K codewords (matrix of size K × T) and tests' outcomes $y^T$ are said
to be jointly typical if they belong to the typical set A, which is defined in the appendix for the noiseless
and additive noise models, and in Eq. (49) for the dilution model.

Decoder: 

Decoding is achieved using typical set decoding [16]. The decoder goes through all $\binom{N}{K}$ possible sets
of size K, where K is the size of the defective set, until it finds a set such that $(x^T_{(K)}, y^T)$ are jointly
typical (Definition 3.1). An error occurs if the generating set of codewords (corresponding to the true
defective set) and the output $y^T$ are not jointly typical, or if any set, other than the positive set, and $y^T$ are
jointly typical.
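The exhaustive search performed by the decoder can be illustrated for the noise-free model. The sketch below swaps the joint-typicality test for a simpler exact-consistency test (a candidate set is accepted when its boolean sum reproduces the observed outcomes), so it is not the decoder analyzed here, only an analogue of its search structure. The function name `decode` and the matrix are hypothetical.

```python
from itertools import combinations


def decode(M, y, K):
    """Search all K-subsets of items; return the unique consistent set or None.

    Returning None mirrors the two error events of the analysis: either no
    candidate matches the observation, or more than one candidate does.
    """
    n_items = len(M[0])
    matches = [
        set(cand)
        for cand in combinations(range(n_items), K)
        if [int(any(row[j] for j in cand)) for row in M] == list(y)
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical 5 x 5 design (rows = tests, columns = items).
M = [
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
print(decode(M, [0, 1, 1, 1, 1], 2))   # unique match
print(decode(M[:3], [1, 1, 1], 2))     # ambiguous with only three tests
```

The search visits up to $\binom{N}{K}$ candidates, so it serves as an analysis device rather than a practical algorithm for large N.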

Analysis of the probability of error: 

Given the random codebook generation, let $P_e$ denote the average probability of error, averaged over all
codebooks and over all sets of size K, i.e.:

$$P_e = \sum_{C} P(C)\, \lambda(C) = \sum_{C} P(C)\, \frac{1}{\binom{N}{K}} \sum_{v} \lambda_v(C) = \frac{1}{\binom{N}{K}} \sum_{v} \sum_{C} P(C)\, \lambda_v(C) \tag{7}$$

By symmetry of the codebook construction, $\sum_{C} P(C)\, \lambda_v(C)$ does not depend on the set v. Thus,

$$P_e = \sum_{C} P(C)\, \lambda_v(C) = P_{e|v} \tag{8}$$

In other words, the average error probability does not depend on the input v due to averaging over
randomly generated codebooks.

Now define the error event $E_i$ as follows:

$E_i$: The event that a set which differs in exactly i items from the true set is jointly typical with $y^T$. The
probability of such an event is denoted $P(E_i)$.

Using the union bound, the average probability of error can now be bounded as:

$$P_e = P_{e|v} \le P(E_0^c) + \sum_{i=1}^{K} P(E_i) \tag{9}$$

The first term corresponds to the event that the output y and the input to the channel (i.e. the codewords 
of the true defective set) are not jointly typical. In the appendix, given proper definition of typicality, 
we show that the probability of this event goes to zero fast for sufficiently large T and is dominated by 
the probabilities of the Ei events for all the models considered in this paper. See the appendix for the 
proof of an upper bound on the probability of the atypical event for the noise free, additive, and dilution 
models, which we state in Lemmas 4.1, 6.1, and 6.3, respectively.

We are interested in determining the required number of tests T to achieve an arbitrarily small error
probability for large N and K with $K \ll N$, and we choose $p = \frac{1}{K}$.
Before we bound the sum in Eq. (9) we prove the following lemma:

Lemma 3.1: If $X_{(K)}$ and Y are jointly distributed as $p(x_{(K)}, y)$ and $\tilde{X}^T_{(K)} = (\tilde{X}^T_{(i)}, X^T_{(K-i)})$ is a set
of K codewords formed by replacing i of the codewords of the true defective set with i independent
codewords (hence also independent of $Y^T$) and keeping the remaining K − i codewords (with the same
marginals), then the probability that $(\tilde{X}^T_{(K)}, Y^T)$ is jointly typical can be bounded as:

$$P\big( (\tilde{X}^T_{(K)}, Y^T) \in A \big) \le 2^{-T\, I(X_{(i)};\, X_{(K-i)}, Y)} \tag{10}$$

By symmetry of the codebook construction, the mutual information expression above depends only on the
number of items i and not on which i items.

Proof:

(11)



Intuitively this means that the probability of obtaining jointly typical sequences, by replacing i codewords 
of the true set with i independent codewords, scales exponentially with the negative of the mutual 
information between i codewords of the set, and the remaining K — i codewords and the output. Before 
stating our main theorem, it is worthwhile mentioning that: In the classical channel coding problem [16], 
the error probability analysis for typical set decoding separates nicely due to the independence of the 
channel output and every codeword other than the truly transmitted one. However, in the group testing 
problem the main difficulty arises from the fact that a collection of K codewords and the true defective 
set could be overlapping. Hence, independence of the output and the collection of codewords does not 
hold anymore. To analyze the error probability, our approach was to isolate the overlap ensuring the 
independence of the remaining codewords from the true defective set and the output. 

Theorem 3.2: Given N items with K unknown defectives, the following number of tests T is achievable
for the noise-free and the noisy models of Eqs. (2), (3) and (4):

$$T > \max_{i = 1, \dots, K} \frac{\log \binom{N-K}{i} \binom{K}{i}}{I(X_{(i)};\, X_{(K-i)}, Y)} \tag{12}$$

where $X_{(w)}$ denotes a collection of code symbols of size w (a vector) and Y denotes the observation.

Intuitively, the numerator represents the number of bits required to describe how many sets we can form
with i misclassified items (i out of K) and the denominator represents the amount of information if K − i
of the defective items are known.

Proof: We can bound the probability of the event $E_i$ by summing over all sets that differ from the
true defective set in i codewords. We can form as many as $\binom{K}{i} \binom{N-K}{i}$ such sets. Thus,
using Lemma 3.1,

$$P(E_i) \le \binom{K}{i} \binom{N-K}{i}\, 2^{-T\, I(X_{(i)};\, X_{(K-i)}, Y)} \tag{13}$$

We can now bound the total error probability for T sufficiently large as:

$$P_e \le P(E_0^c) + \sum_{i=1}^{K} \binom{K}{i} \binom{N-K}{i}\, 2^{-T\, I(X_{(i)};\, X_{(K-i)}, Y)} = P(E_0^c) + \sum_{i=1}^{K} \binom{K}{i} \binom{N-K}{i}\, 2^{-T \left( H(Y \mid X_{(K-i)}) - H(Y \mid X_{(K)}) \right)} \tag{14}$$



As we pointed out earlier and detail in the appendix (Lemmas 4.1, 6.1, and 6.3), the probability of
the atypical event $P(E_0^c)$ is always dominated by the sum of the $P(E_i)$. Hence T in Thm. 3.2 is achievable.

■

Main Intuition: The main idea behind the described approach is illustrated in Fig. 3. The group testing
problem has been mapped to an equivalent multiple channel model. Each channel accounts for the case
where K − i of the defective items are recovered and i items are still to be recognized. The capacity
of each channel contributes partially to the total error probability in decoding the true defective set.
K − i represents the overlap between the true defective set and a false candidate set. Isolating the
overlap makes the remaining i codewords independent of the codewords of the true set, which helps in
bounding the associated error probability.

The result is a simple mutual information expression that can be used to determine the tradeoffs between
K, N, T, and noise for various models, as we show in the following sections. Table I summarizes the
scaling results for the considered models for the average error and worst-case error criteria. Note that
the number of tests increases by a factor of $1/(1-u)^2$ for the dilution model and only by $1/(1-q)$ for the
additive noise model, which matches the aforementioned intuition.



Fig. 3. Equivalent channel model. Each channel accounts for the case where K − i of the defective items are recovered and i
items are still to be recognized. The capacity of each channel contributes partially to the total error probability in decoding the
true defective set. K − i represents the overlap between the true defective set and a false candidate set. Isolating the overlap
makes the remaining i codewords independent of the codewords of the true set, which helps in bounding the associated error
probability.



IV. Noise free case-deterministic output

In this section, we consider the noise-free (deterministic) case: the test outcome Y is 1 if and only if a
defective item is pooled in that test. Hence Y is given by Eq. (1). We consider two scenarios: average and
worst-case error. As mentioned earlier, the former requires the average error to be small and the latter
considers the worst-case input, since bounding the average error probability does not guarantee error-free
performance for all possible defective sets.



A. Average Error 

Before we state the theorem, the following lemma provides an upper bound on the probability of the 
atypical event for the noise-free case. The proof is provided in the appendix. 

Lemma 4.1: For the noiseless model of Eq. (2), $\forall \delta > 0$, there exists a constant $c(\delta) > 0$ such that,
for all $T > c \cdot K \log K$, the probability of the atypical event $P(E_0^c)$ in Eq. (9) (i.e., the probability that the
outcome of the tests and the codewords of the true defective set are not jointly typical) is less than or
equal to $\delta$.

Now we state the following theorem: 

Theorem 4.2: For N items and K defectives, the number of tests $T = O(K \log N)$ is sufficient to
satisfy an average error criterion, i.e., achieve an arbitrarily small average error probability.

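Proof: In the noise free case, $H(Y \mid X_{(K)}) = 0$ since the outcome of a test is deterministic given
the knowledge of all the items in the defective set. Given $X_{(K-i)}$, the outcome is uncertain only when all
K − i known code symbols are 0, which happens with probability $(1-p)^{K-i}$. Recalling that $X_{(i)}$ represents
the code symbols of a set of i defective items, the mutual information expression $I(X_{(i)}; X_{(K-i)}, Y)$ can be
written as:

$$\begin{aligned}
I(X_{(i)}; X_{(K-i)}, Y) &= H(Y \mid X_{(K-i)}) - H(Y \mid X_{(K)}) \\
&= (1-p)^{K-i}\, H\big((1-p)^i\big) \\
&= \left(1 - \tfrac{1}{K}\right)^{K-i} \left[ \left(1 - \tfrac{1}{K}\right)^{i} \log \frac{1}{\left(1 - \frac{1}{K}\right)^{i}} + \left(1 - \left(1 - \tfrac{1}{K}\right)^{i}\right) \log \frac{1}{1 - \left(1 - \frac{1}{K}\right)^{i}} \right]
\end{aligned} \tag{15}$$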



Thus, for large K, 

Using Thm. 3.2 and the fact that $\binom{N}{i} \le 2^{N H(i/N)}$ we can bound T as follows:
(16)



The second inequality uses the Taylor expansion of $\log(1/(1-x))$.
Hence, $T = O(K \log N)$.
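The scaling can be explored numerically. The sketch below assumes that the bound of Thm. 3.2 takes the form of a maximum over i of $\log \binom{N-K}{i}\binom{K}{i}$ divided by the mutual information, and uses the noise-free expression $(1-p)^{K-i} H((1-p)^i)$ with $p = 1/K$. The function names are hypothetical, logarithms are base 2, and the numbers it prints are only indicative of the trend, not results from the paper.

```python
from math import comb, log2


def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def noiseless_mutual_information(K, i, p):
    """I(X_(i); X_(K-i), Y) = (1 - p)^(K - i) * H((1 - p)^i) in the noise-free case."""
    return (1 - p) ** (K - i) * binary_entropy((1 - p) ** i)


def sufficient_tests(N, K):
    """Evaluate max_i log2(C(N-K, i) * C(K, i)) / I_i with p = 1/K.

    Requires K >= 2: for K = 1 the choice p = 1/K = 1 makes the
    mutual information vanish and the ratio is undefined.
    """
    p = 1.0 / K
    return max(
        log2(comb(N - K, i) * comb(K, i)) / noiseless_mutual_information(K, i, p)
        for i in range(1, K + 1)
    )


for N in (10**3, 10**4, 10**5):
    K = 10
    T = sufficient_tests(N, K)
    # Ratio to K * log2(N), to compare against the O(K log N) scaling.
    print(N, round(T, 1), round(T / (K * log2(N)), 3))
```

Printing the ratio to $K \log_2 N$ gives a quick way to see whether the evaluated bound grows proportionally to $K \log N$ for the chosen parameters.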

Intuitive explanation: The average error probability is bounded by the sum of probabilities of events
$E_i$, where

$$P(E_i) \le \binom{N-K}{i} \binom{K}{i}\, 2^{-T \left( H(Y \mid X_{(K-i)}) - H(Y \mid X_{(K)}) \right)} = \binom{N-K}{i} \binom{K}{i}\, 2^{-T (1-p)^{K-i} H\left((1-p)^{i}\right)} \tag{17}$$

To understand the behavior of the average error probability, we study the behavior of the terms above
for every value of i. We consider two cases, i = o(K) and i = O(K), and again choose $p = \frac{1}{K}$.
• First, i = o(K). In this case:

$$P(E_i) \le \binom{N-K}{i} \binom{K}{i}\, 2^{-T \left(1 - \frac{1}{K}\right)^{K - o(K)} H\left( \left(1 - \frac{1}{K}\right)^{o(K)} \right)} \approx 2^{i \log N} \cdot 2^{-T H\left(\frac{i}{K}\right)} = 2^{i \log N} \cdot 2^{-T \beta \frac{i}{K} \log K} \tag{18}$$

where $\beta$ is a constant. Substituting i = o(K), we see that this term goes to zero fast for
$T = \Omega\left( \frac{K \log N}{\log K} \right)$.

• Second, consider the case where i = O(K):

$$P(E_i) = 2^{N H(i/N)} \cdot 2^{-T \beta} \tag{19}$$

$T = O(K \log N)$ is hence a sufficient condition for these terms.

B. Maximum Probability of error 

The previous analysis considered the average error case. Maintaining the average error probability
below $\epsilon$ is not enough if we are interested in the worst-case input, i.e. the maximum error case. For exact
reconstruction, the worst-case error is required to be zero.

Theorem 4.3: For N items and K defectives, $T = O(K^2 \log N)$ is achievable for exact reconstruction
(worst-case error criterion).

Proof: Since the average error probability is below $\epsilon$, then:

$$P_e = \sum_{C} \Pr[C]\, \lambda(C) < \epsilon \;\Rightarrow\; \exists\, C : \lambda(C) < \epsilon \tag{20}$$

That is, since the average probability of error (over codebooks and inputs) is below $\epsilon$, there exists
a codebook C such that $\lambda(C) = \frac{1}{\binom{N}{K}} \sum_{v} \lambda_v(C) < \epsilon$. Choosing $\epsilon = \frac{1}{\binom{N}{K}}$ guarantees
that the worst-case error is also 0, since for the noiseless case $\lambda_v \in \{0, 1\}$. Driving the error terms to
zero requires:

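$$\binom{N-K}{i} \binom{K}{i}\, 2^{-T\, I(X_{(i)};\, X_{(K-i)}, Y)} < \epsilon = \frac{1}{\binom{N}{K}}, \qquad i = 1, \dots, K \tag{21}$$

Hence $T = O(K^2 \log N)$.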



V. Achievability with distortion

In this section we relax our goal. We are satisfied with recovering a large fraction of the defective
items. In other words, we allow an approximate reconstruction [15] in the sense that if K candidates are
declared, up to $\alpha K$ misses are allowed ($\alpha$ small).

Theorem 5.1: If N is the total number of items and K the size of the defective set, approximate
reconstruction of the defective set, i.e. with up to $\alpha K$ misses, is achievable with
$T = O\left( \frac{\alpha K \log N}{e^{-(1-\alpha)} H(e^{-\alpha})} \right)$.



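Proof: In this case, to drive the error terms to zero:

$$\binom{N-K}{\alpha K} \binom{K}{\alpha K}\, 2^{-T \left(1 - \frac{1}{K}\right)^{K - \alpha K} H\left( \left(1 - \frac{1}{K}\right)^{\alpha K} \right)} \approx 2^{\alpha K \log N} \cdot 2^{-T e^{-(1-\alpha)} H(e^{-\alpha})} < 1 \tag{22}$$

Hence, for large K and $p = \frac{1}{K}$, $T = \frac{\alpha K \log N}{e^{-(1-\alpha)} H(e^{-\alpha})}$ is achievable. ■
Remark 5.1: The fact that the achievable bound above scales only with K suggests a multistage
approach where K items are declared, followed by $\alpha K$ items, etc. For the second stage, with $\alpha^2 K$
distortion, we require $T = \frac{\alpha^2 K \log N}{e^{-(1-\alpha)} H(e^{-\alpha})}$ tests. Hence for the multistage approach, the total number
of tests required is such that:

$$T \le \frac{\alpha K \log N}{e^{-(1-\alpha)} H(e^{-\alpha})\, (1 - \alpha)} \tag{23}$$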

VI. Noisy Group Testing 

The derived upper bound in Thm. 3.2 is fairly general as it maps the group testing problem to an
equivalent channel model. This does not restrict the model to the noise-free scenario and hence could 
also be used to account for different noisy versions of the problem. The question reduces to how easy it 
is to compute the mutual information expression. In this section, we derive sufficient conditions on the 
number of tests for two types of noisy channels. It is to be noted that the result could also be applied to 
other noise models. 

A. Additive Observation Noise 

First, we consider the additive output model of Eq. (3), where:

$$Y_t = \left( \bigvee_{i=1}^{N} X_{ti} R_i \right) \vee W_t$$

where $W_t$ is Bernoulli(q). This model captures the possibility of probabilistic false alarms. This accounts
for errors in blood tests, background wireless losses [5], etc.

Again, we first upper bound the probability of the atypical event. See the appendix for a proof of the 
following lemma. 

Lemma 6.1: For the additive model of Eq. (3), $\forall \delta > 0$, $\exists c > 0$ such that, $\forall T > c \cdot \max\{K \log K, \frac{K}{1-q}\}$,
the probability of the atypical event $P(E_0)$ in Eq. (9) (i.e., the probability that the outcome of the tests
and the codewords of the true defective set are not jointly typical) is $< \delta$.



Now we state the following theorem: 

Theorem 6.2: For the additive noise model of Eq. (3), with $N$ items and $K$ defectives, $T = O\left(\frac{K \log N}{1-q}\right)$ is
achievable, where $q$ is the parameter of the Bernoulli distribution of the binary noise.
Note: As $q$ increases, the number of tests required to identify the defective set increases since the outcome
of a pooling test becomes less reliable due to false alarms.

Proof: Note that for the considered additive model, knowing $K - i$ defective items might not
completely determine the test outcome. The test outcome $Y$ remains uncertain if their corresponding
code symbols $X_{(K-i)}$ are all 0. The test is negative only if the code symbols $X_{(i)}$ for the remaining $i$
items, as well as the realization of the Bernoulli noise for that test, are all 0. Thus,

$$\begin{aligned} I(X_{(i)}; X_{(K-i)}, Y) &= H(Y|X_{(K-i)}) - H(Y|X_{(K)}) \\ &= (1-p)^{K-i} H\left((1-p)^i (1-q)\right) - (1-p)^K H(q) \\ &= (1-p)^{K-i}\left[ H\left((1-p)^i(1-q)\right) - (1-p)^i H(q) \right] \end{aligned} \tag{24}$$

The first entropy term can be written as:



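$$\begin{aligned} H\left((1-p)^i(1-q)\right) &= (1-p)^i(1-q)\log\frac{1}{(1-p)^i(1-q)} + \left(1-(1-p)^i(1-q)\right)\log\frac{1}{1-(1-p)^i(1-q)} \\ &= \left(1-\frac{1}{K}\right)^i(1-q)\log\frac{1}{\left(1-\frac{1}{K}\right)^i(1-q)} + \frac{1}{\ln 2}\left[\left(1-\frac{1}{K}\right)^i(1-q) - \sum_{j=2}^{\infty}\frac{1}{j(j-1)}\left(1-\frac{1}{K}\right)^{ij}(1-q)^j\right] \end{aligned} \tag{25}$$

The second entropy term in Eq. (24) can be written as:

$$\left(1-\frac{1}{K}\right)^i H(q) = \left(1-\frac{1}{K}\right)^i\left[(1-q)\log\frac{1}{1-q} + q\log\frac{1}{q}\right] \tag{26}$$

Subtracting Equations (25) and (26), and multiplying by $(1-p)^{K-i}$, the mutual information expression
simplifies to:

$$I(X_{(i)};X_{(K-i)},Y) = \left(1-\frac{1}{K}\right)^{K}\left[ i(1-q)\log\frac{1}{1-\frac{1}{K}} + \frac{1}{\ln 2}\left((1-q) - \sum_{j=2}^{\infty} \frac{\left(1-\frac{1}{K}\right)^{i(j-1)}(1-q)^j}{j(j-1)}\right) - q\log\frac{1}{q}\right] \tag{27}$$

Now note that

$$\sum_{j=2}^{\infty}\frac{\left(1-\frac{1}{K}\right)^{i(j-1)}(1-q)^j}{j(j-1)} \leq 1 - q - q\ln 2\,\log\frac{1}{q} \tag{28}$$

since:

$$\sum_{j=2}^{\infty} \frac{(1-q)^j}{j(j-1)} = (1-q)^2\left(1-\frac{1}{2}\right) + (1-q)^3\left(\frac{1}{2}-\frac{1}{3}\right) + \ldots = 1 - q - q\ln 2\,\log\frac{1}{q}$$

Replacing in Eq. (27) we get:

$$I(X_{(i)};X_{(K-i)},Y) \geq i\left(1-\frac{1}{K}\right)^K (1-q)\log\frac{1}{1-\frac{1}{K}} \geq \frac{i}{K\ln 2}\left(1-\frac{1}{K}\right)^K(1-q) \tag{29}$$

Hence, $T \leq \frac{(e\ln 2)\, K\log N}{1-q}$, i.e., $T = O\left(\frac{K\log N}{1-q}\right)$ is achievable.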
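
The bound can be checked numerically. The sketch below is illustrative only: it evaluates the mutual information of Eq. (24) with $p = 1/K$, compares it with a lower bound of the form $\frac{i}{K \ln 2}(1-\frac{1}{K})^K(1-q)$ (assumed here to be the final bound of the proof), and verifies the series identity $\sum_{j \geq 2} \frac{(1-q)^j}{j(j-1)} = 1 - q - q \ln 2 \log\frac{1}{q}$ by truncation. All function names are hypothetical.

```python
import math

def h2(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def mi_additive(K, i, q):
    """Eq. (24) with p = 1/K: (1-p)^(K-i) [H((1-p)^i (1-q)) - (1-p)^i H(q)]."""
    p = 1.0 / K
    a = (1.0 - p) ** i
    return (1.0 - p) ** (K - i) * (h2(a * (1.0 - q)) - a * h2(q))

def mi_lower_bound(K, i, q):
    """Assumed form of the final bound: i/(K ln 2) * (1 - 1/K)^K * (1 - q)."""
    return i / (K * math.log(2)) * (1.0 - 1.0 / K) ** K * (1.0 - q)

def series_tail(q, terms=400):
    """Truncated sum_{j>=2} (1-q)^j / (j (j-1))."""
    return sum((1.0 - q) ** j / (j * (j - 1)) for j in range(2, terms))

for q in (0.05, 0.3, 0.7):
    print(q, mi_additive(20, 1, q), mi_lower_bound(20, 1, q))
```

Since the bound is linear in $1-q$, the number of tests suggested by it grows like $1/(1-q)$, in line with Theorem 6.2.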



Remark 6.1: Following the same argument as in Section IV-B, it is not hard to see that the same scaling
holds for the worst-case error criterion but replacing $K$ with $K^2$, as shown in Table I. In this noisy setup,
this would mean that the worst-case error probability goes to zero, in contrast to the noise-free case where
this scaling ensures exact recovery.



B. Dilution 

The second noisy model we consider is the "dilution" model. Positive items taking part in a given test
might probabilistically behave as absent (diluted). If all positive items in a given test appear as absent,
that could potentially lead to erroneously zero outcomes. This model is motivated by blood dilution due
to pooling with other negative tests, or imperfectly diluted blood samples, or adversarial camouflage (in
the form of probabilistic transmission) in communication systems [5]. This is captured by the Z-channel
model of Fig. 2. For this case, we show that $T = O\left(\frac{K \log N}{(1-u)^2}\right)$ is achievable. Intuitively, a larger flip
probability $u$ implies that more items will get diluted. As the tests become less reliable, a larger number
of tests is needed to identify the defective set.

First, it is shown that the probability of the atypical event is dominated by the probability of the other
error events (see Eq. (9)), which follows from the following lemma.

Lemma 6.3: For the dilution model, $\forall \delta > 0$, $\exists c > 0$ such that, $\forall T > c \cdot K \log^2 K$, the
probability of the atypical event (Eq. (9)) $P(E_0) < \delta$.

Proof: See appendix. ■

Theorem 6.4: Considering $N$ items, $K$ defectives and the dilution model represented by the Z-channel
in Fig. 2, $T = O\left(\frac{K \log N}{(1-u)^2}\right)$ is achievable, where $u$ is the transition probability of the Z-channel
(i.e. the probability that 1 is flipped into 0).

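Proof: For notational convenience we let $u = 1 - s$. Recalling that $X_{(i)}$ refers to the code symbols
of a set of $i$ defective items, the mutual information can be written as:

$$\begin{aligned} I(X_{(i)};X_{(K-i)},Y) &= H(Y|X_{(K-i)}) - H(Y|X_{(K)}) \\ &= \sum_{j=0}^{K-i}\binom{K-i}{j}\left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-i-j} H\left((1-s)^j\left(1-\frac{s}{K}\right)^i\right) \\ &\quad - \sum_{j=0}^{K}\binom{K}{j}\left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-j}H\left((1-s)^j\right) \end{aligned} \tag{30}$$

The first sum, i.e. $H(Y|X_{(K-i)})$, can be written as:

$$\begin{aligned} H(Y|X_{(K-i)}) &= \sum_{j=0}^{K-i}\binom{K-i}{j}\left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-i-j} H\left((1-s)^j\left(1-\frac{s}{K}\right)^i\right) \\ &= \sum_{j=0}^{K-i}\binom{K-i}{j}\left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-i-j}(1-s)^j\left(1-\frac{s}{K}\right)^i\log\frac{1}{(1-s)^j\left(1-\frac{s}{K}\right)^i} \\ &\quad + \sum_{j=0}^{K-i}\binom{K-i}{j}\left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-i-j}\left(1-(1-s)^j\left(1-\frac{s}{K}\right)^i\right)\log\frac{1}{1-(1-s)^j\left(1-\frac{s}{K}\right)^i} \\ &= \eta_1 + \eta_2 \end{aligned} \tag{31}$$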



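Notice that:

$$\sum_{j=0}^{K-i}\binom{K-i}{j}\left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-i-j}(1-s)^j = \left(1-\frac{s}{K}\right)^{K-i} \tag{32}$$

$$\sum_{j=0}^{K-i} j\binom{K-i}{j}\left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-i-j}(1-s)^j = \frac{1}{K}(K-i)(1-s)\left(1-\frac{s}{K}\right)^{K-i-1} \tag{33}$$

Using Eq. (32) and (33), the first term $\eta_1$ in Eq. (31) simplifies to:

$$\eta_1 = i\left(1-\frac{s}{K}\right)^K\log\frac{1}{1-\frac{s}{K}} + \frac{K-i}{K}(1-s)\left(1-\frac{s}{K}\right)^{K-1}\log\frac{1}{1-s} \tag{34}$$

Equivalently, the second sum, i.e. $H(Y|X_{(K)})$, simplifies to:

$$H(Y|X_{(K)}) = \sum_{j=0}^{K} \binom{K}{j} \left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-j} H\left((1-s)^j\right) = \theta_1 + \theta_2 \tag{35}$$

where $\theta_1$ and $\theta_2$ are obtained from $\eta_1$ and $\eta_2$ by setting $i = 0$. From Eq. (33), $\theta_1$ simplifies to:

$$\theta_1 = (1-s)\left(1-\frac{s}{K}\right)^{K-1}\log\frac{1}{1-s} \tag{36}$$

Combining $\eta_1$ and $\theta_1$ we get: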



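$$\begin{aligned} \eta_1 - \theta_1 &= i\left(1-\frac{s}{K}\right)^{K-1}\left[\left(1-\frac{s}{K}\right)\log\frac{1}{1-\frac{s}{K}} - \frac{1}{K}(1-s)\log\frac{1}{1-s}\right] \\ &= \frac{i}{\ln 2}\left(1-\frac{s}{K}\right)^{K-1}\left[\frac{s}{K} + O\left(\frac{1}{K^2}\right) - \frac{1}{K}(1-s)\left(s + \frac{s^2}{2} + \ldots\right)\right] \\ &= \frac{i}{\ln 2}\left(1-\frac{s}{K}\right)^{K-1}\left[\frac{s^2}{K}\left(1-\frac{1}{2}\right) + \frac{s^3}{K}\left(\frac{1}{2}-\frac{1}{3}\right) + \ldots + O\left(\frac{1}{K^2}\right)\right] \\ &\geq \frac{i s^2}{2K\ln 2}\left(1-\frac{s}{K}\right)^{K-1} \end{aligned} \tag{37}$$

Now we look at the remaining terms, i.e. $\eta_2$ and $\theta_2$. It is sufficient to show that the difference $\eta_2 - \theta_2$
is $\geq 0$: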



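$$\eta_2 = \sum_{j=0}^{K-i}\binom{K-i}{j}\left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{K-i-j}\left(1-(1-s)^j\left(1-\frac{s}{K}\right)^i\right)\log\frac{1}{1-(1-s)^j\left(1-\frac{s}{K}\right)^i} \tag{38}$$

Using Eq. (32), $\eta_2$ simplifies to:

$$\eta_2 = \frac{1}{\ln 2}\left[\left(1-\frac{s}{K}\right)^K - \sum_{m=2}^{\infty} \frac{1}{m(m-1)} \left(1-\frac{s}{K}\right)^{im}\left(1-\frac{1}{K}+\frac{(1-s)^m}{K}\right)^{K-i}\right] \tag{39}$$

Similarly,

$$\theta_2 = \frac{1}{\ln 2}\left[\left(1-\frac{s}{K}\right)^K - \sum_{m=2}^{\infty}\frac{1}{m(m-1)}\left(1-\frac{1}{K}+\frac{(1-s)^m}{K}\right)^K\right] \tag{40}$$

Comparing equations (39) and (40) it is now clear that for large $K$, $\eta_2 - \theta_2 \geq 0$. This is easy to verify
since:

$$1 - \frac{1}{K} + \frac{1}{K}(1-s)^2 = 1 - \frac{2s}{K} + \frac{s^2}{K}$$

Now,

$$1 - \frac{2s}{K} + \frac{s^2}{K} \geq \left(1-\frac{s}{K}\right)^2 = 1 - \frac{2s}{K} + \frac{s^2}{K^2}$$

Thus, $T = O\left(\frac{K \log N}{s^2}\right)$ is achievable. Replacing for $s = 1 - u$, Thm. 6.4 follows.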
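
A small numerical check of the dilution analysis is sketched below. It is an illustration under stated assumptions: each defective item is pooled with probability $1/K$ and, if pooled, is independently diluted with probability $u = 1 - s$, so that a test with $j$ known pooled defectives and $i$ unknown ones is negative with probability $(1-s)^j (1 - s/K)^i$. The function names are hypothetical. The code evaluates $I(X_{(i)}; X_{(K-i)}, Y)$ exactly as a difference of two conditional entropies and compares it with a bound of the form $\frac{i s^2}{2K \ln 2}(1 - \frac{s}{K})^{K-1}$.

```python
import math

def h2(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def cond_entropy(K, known, unknown, s):
    """H(Y | code symbols of `known` defectives), `unknown` others averaged out."""
    p = 1.0 / K
    total = 0.0
    for j in range(known + 1):
        weight = math.comb(known, j) * p ** j * (1.0 - p) ** (known - j)
        total += weight * h2((1.0 - s) ** j * (1.0 - s / K) ** unknown)
    return total

def mi_dilution(K, i, u):
    """I(X_(i); X_(K-i), Y) = H(Y | X_(K-i)) - H(Y | X_(K)) for the Z-channel model."""
    s = 1.0 - u
    return cond_entropy(K, K - i, i, s) - cond_entropy(K, K, 0, s)

def mi_dilution_bound(K, i, u):
    """Assumed form of the bound: i s^2 / (2 K ln 2) * (1 - s/K)^(K-1)."""
    s = 1.0 - u
    return i * s * s / (2.0 * K * math.log(2)) * (1.0 - s / K) ** (K - 1)

for u in (0.1, 0.3, 0.5):
    print(u, mi_dilution(30, 1, u), mi_dilution_bound(30, 1, u))
```

With $u = 0$ the expression reduces to the noise-free value $(1-p)^{K-i} H((1-p)^i)$, which gives a quick sanity check of the implementation.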

Remark 6.2: Following the same argument as in Section IV-B, it is not hard to see that the same scaling
holds for the worst-case error criterion but replacing $K$ with $K^2$, as shown in Table I.



VII. Lower Bound: Fano's inequality 

In this section we also derive lower bounds on the required number of tests using Fano's inequality.

Theorem 7.1: For $N$ items and $K$ defectives, a lower bound on the total number of tests required to
recover the defective set is given by:

$$T \geq \frac{\log\binom{N}{K}}{I(X_{(K)};Y)} \tag{41}$$

Proof: The average error probability $P_e$ can be lower bounded using Fano's inequality:

$$P_e \overset{(a)}{\geq} \frac{\log\binom{N}{K} - I(X_{(K)}^T;Y^T) - 1}{\log\binom{N}{K}} \tag{42}$$

where (a) follows from the data processing inequality. Hence bounding the error probability requires:

$$\begin{aligned} \log\binom{N}{K} &\leq I(X^T_{(K)};Y^T) \\ &\overset{(a)}{=} \sum_{t=1}^{T} H(Y(t)|Y^{t-1}) - H(Y(t)|X_{(K)}(t)) \\ &\overset{(b)}{\leq} \sum_{t=1}^{T} H(Y(t)) - H(Y(t)|X_{(K)}(t)) \\ &= \sum_{t=1}^{T} I(X_{(K)}(t);Y(t)) \\ &\leq T\max I(X_{(K)};Y) \end{aligned} \tag{43}$$

(a) is due to the fact that tests are memoryless and (b) is true since conditioning reduces entropy. Hence
it is necessary that $T \geq \frac{\log\binom{N}{K}}{\max I(X_{(K)};Y)}$.

Remark 7.1: According to this lower bound, it is not hard to see that for the noise free case T > 
K log N is a necessary condition on the total number of tests. 
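
For a rough sense of scale, the following sketch evaluates the Fano-type bound $T \geq \log\binom{N}{K} / I(X_{(K)};Y)$ in bits. It is an illustration only: the function name and the values of $N$ and $K$ are assumptions, and the default of 1 bit per test is the largest value the mutual information of a binary outcome can take.

```python
import math

def fano_lower_bound(N, K, mi_bits=1.0):
    """Lower bound on the number of tests: log2 C(N, K) / I(X_(K); Y).

    mi_bits is the per-test mutual information in bits; a binary outcome
    carries at most 1 bit, which gives the weakest (smallest) bound.
    """
    return math.log2(math.comb(N, K)) / mi_bits

N, K = 1000, 10  # illustrative values (assumptions)
bound = fano_lower_bound(N, K)
print(bound, K * math.log2(N / K), K * math.log2(N))
```

Since $(N/K)^K \leq \binom{N}{K} \leq N^K$, the bound always lies between $K \log(N/K)$ and $K \log N$.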

Remark 7.2: It is straightforward to verify that the lower bound depends on the minimum term in the
upper bound summation since:

$$\sum_{i} \binom{N-K}{i}\binom{K}{i} 2^{-T I(X_{(i)};X_{(K-i)},Y)} \overset{(1)}{\geq} \sum_{i}\binom{N-K}{i}\binom{K}{i} 2^{-T I(X_{(K)};Y)} \approx \binom{N}{K} 2^{-T I(X_{(K)};Y)} \tag{44}$$

(1) is true since conditioning reduces entropy, which ensures that

$$I(X_{(i)};X_{(K-i)},Y) = H(Y|X_{(K-i)}) - H(Y|X_{(K)}) \leq H(Y) - H(Y|X_{(K)}) = I(X_{(K)};Y)$$

It is clear that the number of tests required to let the R.H.S. of Eq. (44) go to zero matches the derived
lower bound, i.e. $T \geq \frac{\log\binom{N}{K}}{I(X_{(K)};Y)}$.
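
The combinatorial prefactor in the upper bound summation can be checked directly. The sketch below is an illustration with a hypothetical function name: it verifies Vandermonde's identity $\sum_{i=1}^{K}\binom{N-K}{i}\binom{K}{i} = \binom{N}{K} - 1$, which is why the sum of the prefactors is essentially $\binom{N}{K}$.

```python
import math

def prefactor_sum(N, K):
    """Sum over i = 1..K of C(N-K, i) * C(K, i), the error-event counts."""
    return sum(math.comb(N - K, i) * math.comb(K, i) for i in range(1, K + 1))

for N, K in ((50, 5), (200, 8)):
    print(N, K, prefactor_sum(N, K), math.comb(N, K) - 1)
```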

VIII. Conclusions 

In this paper, we adopted a new information theoretic framework to address group testing problems. This 
approach shifts the philosophy of random disjunct/separable matrix generation to an equivalent channel 
model and capacity computation. The result is a fairly general achievable bound that enables us to obtain 
the required tradeoffs between the number of tests, number of items and number of defective items for 
a wide range of group testing problems. Obtaining these tradeoffs reduces to a simple computation of 
mutual information expressions. We obtain the asymptotic scaling for i) noise-free setups with average
and worst-case errors; ii) approximate reconstruction; and iii) noisy versions of group testing, namely
additive and dilution models.
Appendix

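Proof of Lemma 4.1

For both the noiseless and the additive noise models, define the typical set $A$ as:

$$\begin{aligned} A = \Big\{(x^{K\times T}, y^T):\ & \left|\frac{N(x_{(i)}=0,y=0)}{T} - p(x_{(i)}=0,y=0)\right| < \epsilon\, p(x_{(i)}=0,y=0), \\ & \left|\frac{N(x_{(i)}=0,y=1)}{T} - p(x_{(i)}=0,y=1)\right| < \epsilon\, p(x_{(i)}=0,y=1),\ \forall i, \\ & \left|\frac{N(x_l)}{T} - p(x_l)\right| < \epsilon\, p(x_l),\ x_l\in\{0,1\},\ \forall l \Big\} \end{aligned} \tag{45}$$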



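where $N(\cdot)$ is the number of occurrences. Now we show that the probability of being atypical is bounded.
The probability that the matrix of codewords corresponding to the true defective set and the test outcomes
are not jointly typical can be bounded as:

$$\begin{aligned} P(A^c) &\overset{(a)}{\leq} \Pr\left[\left|\frac{N(x_{(i)}=0,y=0)}{T} - p(x_{(i)}=0,y=0)\right| > \epsilon\, p(x_{(i)}=0,y=0) \text{ for some } i\right] \\ &\quad + \Pr\left[\left|\frac{N(x_{(i)}=0,y=1)}{T} - p(x_{(i)}=0,y=1)\right| > \epsilon\, p(x_{(i)}=0,y=1) \text{ for some } i\right] \\ &\quad + \Pr\left[\left|\frac{N(x_l)}{T}-p(x_l)\right| > \epsilon\, p(x_l) \text{ for some } l,\ x_l\in\{0,1\}\right] \\ &\overset{(b)}{\leq} 2^K\cdot 2^{-Tp(x_{(i)}=0,y=0)\epsilon^2} + \sum_i 2^{-Tp(x_{(i)}=0,y=1)\epsilon^2} + K\left(2^{-Tp(0)\epsilon^2} + 2^{-Tp(1)\epsilon^2}\right) \end{aligned} \tag{46}$$

where (a) is obtained by applying the union bound, (b) follows from the union and Chernoff bounds.
Now consider the noise free model. In this case,

$$p(x_{(i)}=0,y=0) = \left(1-\frac{1}{K}\right)^K$$

$$p(x_{(i)}=0,y=1) = \left(1-\frac{1}{K}\right)^i\left(1-\left(1-\frac{1}{K}\right)^{K-i}\right)$$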



Also note that in the noise free case the index $i$ in the summation runs from 1 to $K - 1$ since the
$K$-th term is always zero. This is true because the Boolean sum of an all-zero vector cannot be 1. Hence,
there always exists a constant $c_1 > 0$ such that the first term in Eq. (46) is arbitrarily close to zero for
all $T > \frac{c_1 K}{\epsilon^2 (1-\frac{1}{K})^K}$, i.e., as $K \to \infty$, $\forall T > \text{const} \cdot K$. Similarly, each term in the second summation is
arbitrarily small for all $T > c_2 K \log K$ for some constant $c_2$. Also $\exists c_3 > 0$ such that the last term is
arbitrarily small $\forall T > c_3 K \log K$ since $p(0) = 1 - \frac{1}{K}$ and $p(1) = \frac{1}{K}$.
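
The typicality conditions can be illustrated by simulation. The sketch below is not part of the proof: it draws the code symbols of the $K$ defective items i.i.d. Bernoulli($1/K$) over $T$ noise-free tests and compares the empirical frequencies $N(x_{(i)}=0, y=0)/T$ and $N(x_{(i)}=0, y=1)/T$ with $(1-\frac{1}{K})^K$ and $(1-\frac{1}{K})^i(1-(1-\frac{1}{K})^{K-i})$. The function name, the seed, and the values of $K$, $i$, $T$ are assumptions chosen for the example.

```python
import random

def joint_frequencies(K, i, T, rng):
    """Empirical frequencies of (x_(i) = 0, y = 0) and (x_(i) = 0, y = 1).

    Each column holds the K code symbols of the defective items for one test;
    the noise-free outcome y is their Boolean OR, and x_(i) is the sub-vector
    made of the first i symbols.
    """
    n00 = n01 = 0
    for _ in range(T):
        column = [rng.random() < 1.0 / K for _ in range(K)]
        y = any(column)
        if not any(column[:i]):      # sub-vector x_(i) is all zero
            if y:
                n01 += 1
            else:
                n00 += 1
    return n00 / T, n01 / T

rng = random.Random(2)
K, i, T = 8, 3, 20000  # illustrative values (assumptions)
f00, f01 = joint_frequencies(K, i, T, rng)
p00 = (1.0 - 1.0 / K) ** K
p01 = (1.0 - 1.0 / K) ** i * (1.0 - (1.0 - 1.0 / K) ** (K - i))
print(f00, p00, f01, p01)
```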



Proof of Lemma 6.1

Now consider the additive noise model of Eq. (3). In this case,

$$p(x_{(i)}=0,y=0) = (1-q)\left(1-\frac{1}{K}\right)^K$$

$$p(x_{(i)}=0,y=1) = \left(1-\frac{1}{K}\right)^i\left(1-(1-q)\left(1-\frac{1}{K}\right)^{K-i}\right)$$

Thus, the first term in Eq. (46) for this additive model is arbitrarily small for all $T > \frac{c_1 K}{1-q}$ with a proper
choice of a constant $c_1$. For all $T > c_2 K \log K$, the terms in the summation of the second term are
arbitrarily small since the probability $p(x_{(i)}=0,y=1)$ in the exponent of each term is in fact greater
than for the noise free case.

Now we verify that the set $A$ as defined provides us with the desired entropic properties. For the noise
free and additive noise models we observe that for any set $X_{(i)}$ we can write the joint entropy as:

$$\begin{aligned} H(X_{(i)},Y) &= -\sum_{x_{(i)}}\sum_{y} p(x_{(i)},y)\log p(x_{(i)},y) \\ &\overset{(1)}{=} -p(x_{(i)}=0,y=0)\log p(x_{(i)}=0,y=0) - p(x_{(i)}=0,y=1)\log p(x_{(i)}=0,y=1) \end{aligned} \tag{47}$$

where (1) follows from the fact that having any of the entries of the vector $X_{(i)}$ equal to 1 would make
$y$ deterministically 1 for these 2 models. Now we write the empirical joint distribution,

$$-\frac{1}{T}\sum_{t=1}^{T}\log p(x_{(i),t},y_t) = -\frac{N(x_{(i)}=0,y=0)}{T}\log p(x_{(i)}=0,y=0) - \frac{N(x_{(i)}=0,y=1)}{T}\log p(x_{(i)}=0,y=1) \tag{48}$$

Comparing Eqs. (47) and (48) it is clear that the set $A$ ensures that the empirical joint distributions converge
to the true joint entropies; in other words, for any $(x^{K\times T},y) \in A$, $p(x^{i\times T},y) \approx 2^{-TH(X_{(i)},Y)}$ for
any sub-matrix of $i$ rows. Note that given the last conditions in the definition of the typical set $A$,
it is straightforward that the empirical marginal distribution $-\frac{1}{T}\log p(x^{i\times T})$ approximates the entropy
$H(X_{(i)})$.


Proof of Lemma 6.3

The dilution case is more involved and requires a different definition for the typical set to ensure that
the entropic properties are all met. We adopt the conventional definition of weak typicality [16] with an
extra constraint that the number of ones per column of $x^{K\times T}$ (the $t$-th column is denoted $x_t$) does not
exceed $O(\log K)$. More specifically the typical set $A$ is defined as,

$$A = \left\{(x^{K\times T},y^T):\ \left|-\frac{1}{T}\log p(x_{(i)},y) - H(X_{(i)},Y)\right| < \epsilon H(X_{(i)},Y)\ \forall i,\quad \|x_t\|_1 \leq O(\log K),\ \forall t\right\} \tag{49}$$



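Now let

$$z_{ijt}(y) = \left(\mathbb{1}_{\{\|x_{(i),t}\|_1 = j,\ y_t = y\}} - \binom{i}{j} p(j,y)\right)\log p(j,y),\quad y\in\{0,1\}$$

where $\mathbb{1}_{\{\cdot\}}$ is an indicator function and

$$p(j,0) = \left(\frac{u}{K}\right)^j\left(1-\frac{1}{K}\right)^{i-j}\left(1-\frac{1-u}{K}\right)^{K-i}$$

and,

$$p(j,1) = \left(\frac{1}{K}\right)^j\left(1-\frac{1}{K}\right)^{i-j} - p(j,0)$$

Note that $p(j,y)$ corresponds to an instance where a column has exactly $j$ ones. The first set of conditions
in the definition of the typical set is equivalent to:

$$\left|\frac{1}{T}\sum_{t=1}^{T}\left(Z_{it}(0)+Z_{it}(1)\right)\right| < \epsilon H(X_{(i)},Y),\ \forall i$$

where $Z_{it}(y) = \sum_{j=0}^{i} z_{ijt}(y)$, $y \in \{0, 1\}$. The second condition in the definition of the typical set means
that being typical requires the number of ones per column to be upper bounded by $\log K$.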

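For a fixed $i$, the random variables $\zeta_{it} = Z_{it}(0) + Z_{it}(1)$ are independent with zero mean. Thus, if
$|\zeta_{it}| \leq M$ then Bernstein's inequality implies that:

$$\Pr\left[\sum_{t=1}^{T}\zeta_{it} > \beta\right] \leq \exp\left(-\frac{\beta^2/2}{\sum_t E[\zeta_{it}^2] + M\beta/3}\right) \tag{50}$$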
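
To see how Bernstein's inequality behaves on a simple case, the following sketch compares the right-hand side of Eq. (50) with a Monte Carlo estimate of the tail probability. It is an illustration only: the variables are centered Bernoulli($r$) draws rather than the $\zeta_{it}$ of the proof, and the function names, the seed, and the parameter values are assumptions.

```python
import math
import random

def bernstein_bound(sum_second_moments, M, beta):
    """exp(-(beta^2 / 2) / (sum_t E[zeta_t^2] + M * beta / 3))."""
    return math.exp(-(beta ** 2 / 2.0) / (sum_second_moments + M * beta / 3.0))

def empirical_tail(T, r, beta, trials, rng):
    """Fraction of trials with sum_t (B_t - r) > beta, where B_t ~ Bernoulli(r)."""
    hits = 0
    for _ in range(trials):
        total = sum((1.0 if rng.random() < r else 0.0) - r for _ in range(T))
        if total > beta:
            hits += 1
    return hits / trials

rng = random.Random(1)
T, r, beta = 200, 0.1, 10.0                           # illustrative values (assumptions)
bound = bernstein_bound(T * r * (1.0 - r), 1.0, beta)  # |B_t - r| <= 1, Var = r(1-r)
freq = empirical_tail(T, r, beta, 2000, rng)
print(freq, bound)
```

The bound is loose in this example, which is typical: it is used in the proof only to show exponential decay, not to obtain sharp constants.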



Since $|Z_{it}(y)| \leq (-\log\min p(j,y))$, and the total number of ones in the measurement matrix is bounded
by $\log K$, then $M = O(\log^2 K)$. Furthermore, it is not hard to verify that the variance of $\zeta_{it}$ is $O\left(\left(\frac{i}{K}\right)^2\right)$.
Replacing in Eq. (50) we can see that:



Pr 



ECit >eH(X {i) ,Y) 



t=l 



< Pr 



E iu K 
( it > e— log — 
K u 

t=l 

,2 - 



< 



The inequality in the opposite direction, i.e., Pr 



cxp 



Y%=i&t<-eH{X u Y) 



-^(riog 2 (f) 

T{f) 2 + Tflog 2 K / 



(51) 



, follows the exact same 



analysis. Now, since the probability of having more than log K ones per column goes to zero exponentially 
fast, then using the union bound we can upper bound the probability of being atypical: 



Pr[A c ] . exp 



K 



f^log^) 



iogin og (^ 



(52) 



which is again arbitrarily close to zero with T > cK log K, where c is a constant. 